A general asymptotic framework is developed for studying consistency
properties of principal component analysis (PCA). Our framework
includes several previously studied domains of asymptotics as
special cases and allows one to investigate interesting connections and 
transitions among the various domains. More importantly, it enables 
us to investigate asymptotic scenarios that have not been considered 
before, and gain new insights into the consistency, subspace consis- 
tency and strong inconsistency regions of PCA and the boundaries 
among them. We also establish the corresponding convergence rate 
within each region. Under general spike covariance models, the di- 
mension (or the number of variables) discourages the consistency of 
PCA, while the sample size and spike information (the relative size of 
the population eigenvalues) encourage PCA consistency. Our framework
nicely illustrates the relationship among these three types of
information in terms of dimension, sample size and spike size, and 
rigorously characterizes how their relationships affect PCA consis- 
tency. 

1. Introduction. Principal Component Analysis (PCA) is an impor- 
tant visualization and dimension reduction tool which finds orthogonal di- 
rections reflecting maximal variation in the data. This allows the low di- 
mensional representation of data, by projecting data onto these directions. 
PCA is usually obtained by an eigen decomposition of the sample variance- 
covariance matrix of the data. Properties of the sample eigenvalues and 
eigenvectors have been analyzed under several domains of asymptotics. 

In this paper, we develop a general asymptotic framework to explore in- 
teresting transitions among the various asymptotic domains. The general 
framework includes the traditional asymptotic setups as special cases, which 
allows careful study of the connections among the various setups, and more 
importantly it investigates scenarios that have not been considered before,
and offers new insights into the consistency (in the sense that the angle
between estimated and population eigen directions tends to 0, or the inner
product tends to 1) and strong-inconsistency (where the angle tends to
$\pi/2$, i.e., the inner product tends to 0) properties of PCA, along with
some technically challenging convergence rates.

Existing asymptotic studies of PCA roughly fall into three domains: 

(a) the classical domain of asymptotics, under which the sample size
$n \to \infty$ and the dimension $d$ is fixed (hence the ratio
$n/d \to \infty$). For example, see [2, 3, 10, 13, 18].

(b) the random matrix theory domain, where both the sample size $n$
and the dimension $d$ increase to infinity, with the ratio $n/d \to c$, a
constant mostly assumed to be within $(0, \infty)$. Representative work
includes [7, 12, 27, 31] from the statistical physics literature, as well 
as [4-6, 14, 15, 19, 21-23] from the statistics literature. 

(c) the high dimension low sample size (HDLSS) domain of asymptotics,
which is based on the limit, as the dimension $d \to \infty$, with the
sample size $n$ being fixed (hence the ratio $n/d \to 0$). HDLSS asymptotics
was originally studied by [8], and recently rediscovered by [11]. PCA 
has been studied using the HDLSS asymptotics by [1, 16]. 

PCA consistency and (strong) inconsistency, defined in terms of angles, 
are important properties that have been studied before. A common technical 
device is the spike covariance model, initially introduced by Johnstone [14]. 
This model has been used in this context by, for example, Nadler [21], John- 
stone and Lu [15], and Jung and Marron [16]. An interesting, more general 
model has been considered by Benaych-Georges and Nadakuditi [6]. 

Under the spike model, the first few eigenvalues are much larger than 
the others. A major point of the present paper is that there are three critical 
features whose relationships drive the consistency properties of PCA, namely 

(1) the sample information: the sample size n, which has a positive contri- 
bution to, i.e. encourages, the consistency of the sample eigenvectors. 

(2) the variable information: the dimension d, which has a negative contri- 
bution to, i.e. discourages, the consistency of the sample eigenvectors. 

(3) the spike information: the relative sizes of the several leading eigen- 
values, which also has a positive contribution to the consistency. 

Our general framework considers increasing sample size n, increasing di- 
mension d, and increasing spike information. It clearly characterizes how 
their relationships determine the regions of consistency and strong-inconsistency 
of PCA, along with the boundary between. Furthermore, in the single spike 
case, we explore behavior on the boundary. In addition, our theorems
demonstrate the transitions among the existing domains of asymptotics, and for
the first time to the best of our knowledge, enable one to understand the con- 
nections among them. Note that the classical domain ((a) above) assumes 
increasing sample size n while fixing dimension d; the random matrix do- 
main ((b) above) assumes increasing sample size n and increasing dimension 
d, while fixing the spike information; the HDLSS domain ((c) above) fixes 
the sample size, and increases the dimension and the spike information; thus 
each of these three domains is a boundary case of our framework. Finally, 
our theorems also contain novel results on rates of convergence. 

Sections 3 and 4 formally state very general theorems for the single and 
multiple component spike models, respectively. For illustration purposes 
only, in this section we first consider Examples 1.1 and 1.2 under some 
strong assumptions, which provide intuitive insight regarding the much more 
general theory presented in Sections 3 and 4. 

For these two illustrative examples, the three types of information and
their relationships can be mathematically quantified by two indices, namely
the spike index $\alpha$ and the sample index $\gamma$. Within the context of
these examples, we point out the significant contributions of our results in
comparison with existing results. The comparisons and connections are
graphically illustrated in Figure 1 and discussed below.

Example 1.1. (Single-component spike model) Assume that $X_1, \ldots, X_n$
are random sample vectors from a $d$-dimensional normal distribution
$N(0, \Sigma)$, where the sample size $n \sim d^{\gamma}$ ($\gamma \ge 0$ is
defined as the sample index) and the covariance matrix $\Sigma$ has the
eigenvalues

$\lambda_1 \sim d^{\alpha}, \quad \lambda_2 = \cdots = \lambda_d = 1, \quad \alpha \ge 0,$

where the constant $\alpha$ is defined as the spike index.

Theorem 3.1, when applied to this example, suggests that the maximal
sample eigenvector is consistent when $\alpha + \gamma > 1$ (grey region in
Figure 1(A)), and strongly inconsistent when $0 \le \alpha + \gamma < 1$
(white triangle in Figure 1(A)). Theorem 3.3 explores behavior on the
diagonal boundary $\alpha + \gamma = 1$. These very general new results
nicely connect with many existing ones:

• Previous Results I - the classical domain: 

For this example, Theorem 1 of Anderson [2] implied that for fixed
dimension $d$ and finite eigenvalues, when the sample size $n \to \infty$
(i.e. $\gamma \to \infty$, the limit on the vertical axis), the maximal sample
eigenvector is consistent. This case is the upper left corner of Figure 1(A).

• Previous Results II - the random matrix domain:

Fig 1. General consistency and strong inconsistency regions for PCA, as a
function of the spike index $\alpha$ (horizontal axis) and the sample index
$\gamma$ (vertical axis). Panel (A), single spike model in Example 1.1: PCA
is consistent on the grey region ($\alpha + \gamma > 1$), strongly
inconsistent on the white triangle ($0 \le \alpha + \gamma < 1$), and
in-between consistency and strong inconsistency on the thick diagonal line
($\alpha + \gamma = 1$), including the two dots ($\alpha = 0, \gamma = 1$ and
$\alpha = 1, \gamma = 0$). Panel (B), multiple spike model in Example 1.2:
the first $m$ sample PCs are consistent on the grey region
($\alpha + \gamma > 1$, $\gamma > 0$), subspace consistent on the dotted line
segment ($\alpha > 1$, $\gamma = 0$) on the horizontal axis, and strongly
inconsistent on the white triangle ($0 \le \alpha + \gamma < 1$).

(a) The results of Johnstone and Lu [15] appear on the vertical axis
in Panel (A) where the spike index $\alpha = 0$ (as they fix the spike
information): the first sample eigenvector is consistent when the
sample index $\gamma > 1$ and strongly inconsistent when $\gamma < 1$.

(b) Nadler [21] explored the interesting boundary case of
$\alpha = 0, \gamma = 1$ (i.e. $d/n \to c$ for a constant $c$). This result
appears in Panel (A) as the single solid circle $\gamma = 1$ on the vertical
axis.

• Previous Results III - the HDLSS domain: 

(a) The theorems of Jung and Marron [16] are represented on the
horizontal axis in Panel (A) where the sample index $\gamma = 0$ (as they
fix the sample size): the maximal sample eigenvector is consistent
with the first population eigenvector when the spike index $\alpha > 1$
and strongly inconsistent when $\alpha < 1$.

(b) Jung et al. [17] deeply explored limiting behavior at the boundary
$\alpha = 1, \gamma = 0$. This result appears in Panel (A) as the single solid
circle $\alpha = 1$ on the horizontal axis.

• Our Results hence nicely connect existing domains of asymptotics,
and give a much more complete characterization for the regions of
PCA consistency, subspace consistency, and strong inconsistency. We
also investigate asymptotic properties of the other sample eigenvectors
and all the sample eigenvalues. Furthermore, we provide a new general
connection between Previous Results II (b) and III (b), by studying
asymptotic properties on the boundary case $\alpha + \gamma = 1$.
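
The regions described for Example 1.1 can be summarized as a small lookup on the two indices. The following is a minimal sketch, not part of the paper: the function name `classify_single_spike` and the tolerance `tol` are illustrative assumptions, and the returned labels simply restate the three regions of Figure 1(A).

```python
def classify_single_spike(alpha: float, gamma: float, tol: float = 1e-12) -> str:
    """Region of Figure 1(A) for spike index alpha and sample index gamma.

    Hypothetical helper: it only encodes the sign of alpha + gamma - 1 as
    described for Example 1.1 (n ~ d^gamma, lambda_1 ~ d^alpha).
    """
    if alpha < 0 or gamma < 0:
        raise ValueError("alpha and gamma must be non-negative")
    total = alpha + gamma
    if abs(total - 1.0) <= tol:
        return "boundary"               # thick diagonal line, alpha + gamma = 1
    if total > 1.0:
        return "consistent"             # grey region
    return "strongly inconsistent"      # white triangle


# Earlier domains appear as points or axes of the (alpha, gamma) plane.
cases = {
    "random matrix domain, gamma > 1 (alpha = 0)": (0.0, 1.5),
    "random matrix domain, gamma < 1 (alpha = 0)": (0.0, 0.5),
    "HDLSS domain, alpha > 1 (gamma = 0)": (1.5, 0.0),
    "boundary point alpha = 0, gamma = 1": (0.0, 1.0),
    "boundary point alpha = 1, gamma = 0": (1.0, 0.0),
}
for label, (alpha, gamma) in cases.items():
    print(f"{label}: {classify_single_spike(alpha, gamma)}")
```

The tolerance only guards against floating-point round-off when checking the boundary; it carries no statistical meaning.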

Example 1.2. (Multiple-component spike model) Assume that the covariance
matrix $\Sigma$ in Example 1.1 has the following eigenvalues

$\lambda_j = \begin{cases} c_j d^{\alpha}, & j \le m, \\ 1, & j > m, \end{cases} \qquad \alpha \ge 0,$

where $m$ is a finite positive integer, and the constants $c_j$,
$j = 1, \cdots, m$, are positive and satisfy $c_j > c_{j+1} > 1$,
$j = 1, \cdots, m - 1$.

Theorem 4.1, when applied to this example, shows that the first $m$ sample
eigenvectors are individually consistent with corresponding population
eigenvectors when $\alpha + \gamma > 1$, $\gamma > 0$ (the grey region in
Figure 1(B)), instead of being subspace consistent [16], and strongly
inconsistent when $\alpha + \gamma < 1$ (the white triangle in Panel (B)).
This very general new result connects with many others in the existing
literature:

• Previous Results I - the classical domain: 

For this example, Theorem 1 of Anderson [2] implied that for fixed
dimension $d$ and finite eigenvalues, when the sample size $n \to \infty$
(i.e. $\gamma \to \infty$, the limit on the vertical axis), the first $m$
sample eigenvectors are consistent, while the other sample eigenvectors are
subspace consistent. This case is the upper left corner of Figure 1(B).

• Previous Results II - the random matrix domain: 

Paul [23] explored asymptotic properties of the first $m$ eigenvectors and
eigenvalues in the interesting boundary case of $\alpha = 0, \gamma = 1$,
i.e., $d/n \to c$ with $c \in (0, 1)$. This result appears in Panel (B) as
the solid circle $\gamma = 1$ on the vertical axis. Paul and Johnstone [24]
considered a similar framework but from a minimax risk analysis perspective.
Nadler [21] and Johnstone and Lu [15] did not study multiple spike models.

• Previous Results III - the HDLSS domain: 

The theorems of Jung and Marron [16] are valid on the horizontal axis
in Panel (B) where the sample index $\gamma = 0$. In particular, for this
example, their results showed that the first $m$ sample eigenvectors are
not separable when the spike index $\alpha > 1$ (the horizontal dotted red
line segment); instead they are subspace consistent with their corresponding




DAN SHEN, HAIPENG SHEN AND J. S. MARRON 



population eigenvectors, and are strongly inconsistent when the spike
index $\alpha < 1$ (the horizontal solid line segment). They and Jung et
al. [17] did not study the asymptotic behavior on the boundary, the
single open circle ($\alpha = 1$, $\gamma = 0$) on the horizontal axis.
• Our Results cover the classical domain, and are stronger than what [16] 
obtained: the increasing sample size enables us to separate out the first 
few leading eigenvectors and characterize individual consistency, while 
only subspace consistency was obtained by [16]. 
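
The distinction drawn for Example 1.2 between individual and subspace consistency can be sketched in the same way. This is an illustrative helper under the assumptions of that example, not code from the paper: the name `classify_multi_spike` and the tolerance `tol` are hypothetical, and the diagonal $\alpha + \gamma = 1$ is only reported as a boundary because the discussion above does not assign it to a region.

```python
def classify_multi_spike(alpha: float, gamma: float, tol: float = 1e-12) -> str:
    """Region of Figure 1(B) for the first m sample eigenvectors.

    Hypothetical helper encoding the regions stated for Example 1.2.
    """
    if alpha < 0 or gamma < 0:
        raise ValueError("alpha and gamma must be non-negative")
    total = alpha + gamma
    if total < 1.0 - tol:
        return "strongly inconsistent"        # white triangle
    if total > 1.0 + tol:
        if gamma > tol:
            return "individually consistent"  # grey region, growing sample size
        return "subspace consistent"          # gamma = 0, alpha > 1 (HDLSS axis)
    return "boundary"                         # alpha + gamma = 1


print(classify_multi_spike(1.5, 0.0))  # fixed n: eigenvectors not separable
print(classify_multi_spike(1.5, 0.2))  # growing n separates the eigenvectors
```

The two calls contrast the HDLSS axis with a point just above it, which is the improvement over [16] described in the last bullet.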

The organization of the rest of the paper is as follows. Section 2 first in- 
troduces our notations and several relevant consistency concepts. Section 3 
then presents the theoretical results of single-component spike models, stat- 
ing the asymptotic properties of the sample eigenvalues and eigenvectors 
under our general framework. Section 3.1 first considers single-component 
spike models where the positive information and the negative information 
are unbalanced, and Section 3.2 then studies single-component spike models 
where the two types of information are balanced, i.e. the boundary case. Sec- 
tion 4 studies multiple-component spike models. For easy access to the main 
ideas, Section 4.1 first studies models with distinct eigenvalues, while Sec- 
tion 4.2 then considers models where the eigenvalues are grouped. Section 5 
contains some discussion about the asymptotic properties of PCA when 
some small eigenvalues are equal to zero and the challenges of the non-Gaussian
extension. Section 6 contains the technical proofs of the main theorems. 

2. Notations and Concepts. We now introduce some necessary no- 
tations, and define consistency concepts relevant for our asymptotic study. 

2.1. Notation. Let the population covariance matrix be $\Sigma$, whose eigen
decomposition is

$\Sigma = U \Lambda U^T,$

where $\Lambda$ is the diagonal matrix of population eigenvalues
$\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_d$, and $U$ is the matrix of
corresponding eigenvectors $U = [u_1, \ldots, u_d]$.

Assume that $X_1, \ldots, X_n$ are i.i.d. from a $d$-dimensional normal
distribution $N(\xi, \Sigma)$. Let $\bar{X}$ be the sample mean. As discussed
in Paul [25],

(2.1) $\sum_{i=1}^{n} (X_i - \bar{X})(X_i - \bar{X})^T$ has the same distribution as $\sum_{i=1}^{n-1} Y_i Y_i^T$,

where $Y_i$ are i.i.d. $N(0, \Sigma)$. Since the sample covariance matrix is location
invariant, we assume without loss of generality (WLOG):

Assumption 2.1. $X_1, \ldots, X_n$ are a random sample from a
$d$-dimensional normal distribution $N(0, \Sigma)$.

Denote the sample covariance matrix by $\hat{\Sigma} = n^{-1} X X^T$, where
$X = [X_1, \ldots, X_n]$. Note that $\hat{\Sigma}$ can also be decomposed as

(2.2) $\hat{\Sigma} = \hat{U} \hat{\Lambda} \hat{U}^T,$

where $\hat{\Lambda}$ is the diagonal matrix of sample eigenvalues
$\hat{\lambda}_1 \ge \hat{\lambda}_2 \ge \cdots \ge \hat{\lambda}_d$ and
$\hat{U}$ is the matrix of corresponding sample eigenvectors, where
$\hat{U} = [\hat{u}_1, \ldots, \hat{u}_d]$.
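
To make the sample quantities concrete, the sketch below simulates a single-spike model and measures the inner product between the leading sample eigenvector and its population counterpart. It is an illustration under stated assumptions rather than the authors' procedure: the covariance is taken as diagonal with the spike on the first coordinate (so the first population eigenvector is the first coordinate axis), the values `d = 100`, `gamma = 0.7` and the two spike exponents are arbitrary choices, and the leading eigenvector is approximated by power iteration in plain Python instead of a full eigen decomposition.

```python
import math
import random


def top_eigvec_inner_product(d, n, lam1, rng, iters=200):
    """Approximate |<u1_hat, u1>| for Sigma = diag(lam1, 1, ..., 1).

    The leading eigenvector of Sigma_hat = X X^T / n is approximated by
    power iteration, v <- X (X^T v) / n, without forming the d x d matrix.
    """
    scale = math.sqrt(lam1)
    # samples[i] is X_i; only its first coordinate carries the spike.
    samples = [[rng.gauss(0.0, 1.0) * (scale if k == 0 else 1.0)
                for k in range(d)] for _ in range(n)]
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    for _ in range(iters):
        # w_i = <X_i, v>, then v_new = sum_i w_i X_i / n
        w = [sum(x[k] * v[k] for k in range(d)) for x in samples]
        v = [sum(w[i] * samples[i][k] for i in range(n)) / n for k in range(d)]
        norm = math.sqrt(sum(c * c for c in v))
        v = [c / norm for c in v]
    return abs(v[0])  # u1 is the first coordinate axis


d, gamma = 100, 0.7
n = round(d ** gamma)          # sample size of order d^gamma
rng = random.Random(0)
strong_spike = top_eigvec_inner_product(d, n, d ** 1.2, rng)  # alpha + gamma > 1
weak_spike = top_eigvec_inner_product(d, n, d ** 0.1, rng)    # alpha + gamma < 1
print(f"alpha = 1.2: |<u1_hat, u1>| = {strong_spike:.3f}")
print(f"alpha = 0.1: |<u1_hat, u1>| = {weak_spike:.3f}")
```

At a finite dimension the two values only hint at the asymptotic regimes; the first should be close to 1 and the second close to 0.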

Below we introduce asymptotic notations that will be used in our
theoretical studies. Assume that $\{\xi_n : n = 1, \ldots, \infty\}$ is a
sequence of random variables, and $\{a_n : n = 1, \ldots, \infty\}$ is a
sequence of constant values.

• Denote $\xi_n = o_{a.s}(a_n)$ if $\lim_{n \to \infty} \xi_n / a_n = 0$
almost surely.

• Denote $\xi_n = O_{a.s}(a_n)$ if $\limsup_{n \to \infty} |\xi_n / a_n| \le c$
almost surely, for some constant $c > 0$.

• Denote $\xi_n \sim a_n$ if
$c_2 \le \liminf_{n \to \infty} \xi_n / a_n \le \limsup_{n \to \infty} \xi_n / a_n \le c_1$
almost surely, for two constants $c_1 \ge c_2 > 0$.

It is worth pointing out that for fixed sample size $n$ and $d \to \infty$, we
will consider convergence in probability, instead of almost surely.
Consequently, we modify the above asymptotic notations to the following:

• Denote $\xi_d = o_p(a_d)$ if $\lim_{d \to \infty} \xi_d / a_d = 0$ in
probability.

• Denote $\xi_d = O_p(a_d)$ if $\limsup_{d \to \infty} |\xi_d / a_d| \le z$ in
probability, where the random variable $z$ satisfies $P(0 < z < \infty) = 1$.

• Denote $\xi_d \sim a_d$ if
$z_2 \le \liminf_{d \to \infty} \xi_d / a_d \le \limsup_{d \to \infty} \xi_d / a_d \le z_1$
in probability, where the two random variables satisfy
$P(0 < z_2 \le z_1 < \infty) = 1$.



In addition, we introduce the following notions to help understand the
assumptions on the population eigenvalues in our theorems. Assume that
$\{a_k : k = 1, \ldots, \infty\}$ and $\{b_k : k = 1, \ldots, \infty\}$ are two
sequences of constant values, where $k$ can stand for either $n$ or $d$.

• Denote $a_k \succ b_k$ if $\limsup_{k \to \infty} b_k / a_k < 1$.

• Denote $a_k \gg b_k$ if $\lim_{k \to \infty} b_k / a_k = 0$.

• Denote $a_k \sim b_k$ if
$c_2 \le \liminf_{k \to \infty} a_k / b_k \le \limsup_{k \to \infty} a_k / b_k \le c_1$
for two constants $c_1 \ge c_2 > 0$.

2.2. Concepts. Below we list three important concepts relevant for
consistency and strong inconsistency, some of which are modified from the
related concepts given by Jung and Marron [16] and Shen et al. [28].

Let $\hat{u}_j$ be any normalized sample based estimator of $u_j$ for
$j = 1, \ldots, [n \wedge d]$.

• Consistency with rate $a_n$: The estimator $\hat{u}_j$ is consistent with
its population counterpart $u_j$ with the convergence rate $a_n$ if
$|\langle \hat{u}_j, u_j \rangle| = 1 + O_{a.s}(a_n)$. For example, $a_n =$

• Strong inconsistency with rate $a_n$: $\hat{u}_j$ is strongly inconsistent
with $u_j$ with the convergence rate $a_n$ if
$|\langle \hat{u}_j, u_j \rangle| = O_{a.s}(a_n)$.

Let $H$ be an index set, e.g. $H = \{m + 1, \cdots, d\}$. Define
$S = \mathrm{span}\{u_k, k \in H\}$ to be the linear span generated by
$\{u_k, k \in H\}$.

• Subspace consistency with rate $a_n$: $\hat{u}_j$, $j \in H$, is subspace
consistent with $S$ with convergence rate $a_n$ if

(2.3) $\mathrm{angle}\langle \hat{u}_j, S \rangle = O_{a.s}(a_n),$

where the angle between the estimator $\hat{u}_j$ and the subspace $S$ is the
angle between the estimator and its projection onto the subspace, see
Jung and Marron [16]. For further clarification, we provide a graphical
illustration of the angle in Section B of the supplement [29].

Note that for fixed sample size $n$ and $d \to \infty$, the consistency
concepts can be modified by replacing "$a_n$" and "$O_{a.s}$" with "$a_d$" and
"$O_p$" respectively.
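
The two quantities behind these concepts, the absolute inner product and the angle to a subspace, can be computed directly from their definitions. The sketch below is only an illustration: the helper names `inner_product_abs` and `angle_to_subspace` are hypothetical, vectors are plain Python lists, and the basis of the subspace is assumed to be orthonormal so that the projection norm is a sum of squared inner products.

```python
import math


def inner_product_abs(u_hat, u):
    """|<u_hat, u>| for two unit vectors given as lists."""
    return abs(sum(a * b for a, b in zip(u_hat, u)))


def angle_to_subspace(u_hat, basis):
    """Angle (radians) between the unit vector u_hat and span(basis).

    basis is assumed to be a list of orthonormal vectors, so the projection
    of u_hat onto the span has norm sqrt(sum_k <u_hat, b_k>^2).
    """
    proj_norm = math.sqrt(sum(sum(a * b for a, b in zip(u_hat, b_k)) ** 2
                              for b_k in basis))
    return math.acos(min(1.0, proj_norm))  # clamp guards against round-off


e1, e2, e3 = [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]
s = 1.0 / math.sqrt(2.0)
u_hat = [0.0, s, s]   # mixes the directions e2 and e3 equally

print(inner_product_abs(u_hat, e2))        # away from both 0 and 1
print(angle_to_subspace(u_hat, [e2, e3]))  # essentially zero
print(angle_to_subspace(u_hat, [e1]))      # a right angle
```

The example vector is neither close to `e2` nor orthogonal to it, yet it lies in the span of `e2` and `e3`; this is the situation that subspace consistency is designed to describe.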

3. Single component spike models. Below we state our main theo- 
rems for single-component spike models. In Section 3.1, we study the asymp- 
totic properties of PCA under our general framework in the unbalanced case, 
where either the positive or negative type of information dominates the 
other. In Section 3.2, we investigate the asymptotic properties of PCA in 
the more delicate balanced case, where neither the positive nor the negative 
information is dominant. 

3.1. Single component spike models, unbalanced case. 

3.1.1. Cases with increasing sample size n. We first state in Theorem 3.1 
one of our main theoretical results regarding PCA consistency under our 
general framework. We then offer several remarks in regards to the conditions 
of the theorem as well as the connection between our results and the earlier 
ones in the literature. 

To fix ideas, we assume the maximal eigenvalue $\lambda_1$ dominates the other
eigenvalues. WLOG, we assume that as $n \to \infty$ or $d \to \infty$,

Assumption 3.1. $\lambda_1 \gg \lambda_2 \sim \cdots \sim \lambda_d \sim 1$.

As discussed in the Introduction, we consider the delicate balance among
the positive sample information $n$, the positive spike information
$\lambda_1$, and the negative variable information $d$, and characterize the
various PCA consistency and strong-inconsistency regions.

Theorem 3.1 suggests that the asymptotic properties of the sample
eigenvalues and eigenvectors depend on the relative strength of the positive
information and the negative information, as particularly measured by two
ratios: $d/(n\lambda_1)$ and $d/n$. The value of $d/(n\lambda_1)$ determines
whether the maximal sample eigenvalue is separable from the other
eigenvalues, and further determines the consistency of the maximal sample
eigenvector. The value of $d/n$ determines the strong inconsistency of the
second and higher-order sample eigenvectors.

The following discussion and the scenarios in Theorem 3.1 are arranged 
according to a decreasing amount of positive information: 

• Theorem 3.1(a): If the amount of positive information dominates the
amount of negative information up to the maximal eigenvalue, i.e.
$d/(n\lambda_1) \to 0$, then the maximal sample eigenvector is consistent,
and the other sample eigenvectors are subspace consistent;

• Theorem 3.1(b): In addition, if the amount of negative information
dominates the amount of positive information for the eigenvalues whose
indices are greater than 1, i.e. $d/n \to \infty$, then the corresponding
sample eigenvectors are strongly inconsistent;

• Theorem 3.1(c): On the other hand, if the amount of negative information
always dominates, i.e. $d/(n\lambda_1) \to \infty$, then the sample
eigenvalues are asymptotically indistinguishable, and the sample
eigenvectors are strongly inconsistent.
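
To relate these scenarios to Example 1.1, one can substitute $n \sim d^{\gamma}$ (with $\gamma > 0$, so that $n \to \infty$) and $\lambda_1 \sim d^{\alpha}$ into the two ratios. The short calculation below is a worked illustration that assumes the ratios are $d/(n\lambda_1)$ and $d/n$, as the discussion above indicates:

```latex
\[
\frac{d}{n\lambda_1} \sim d^{\,1-\alpha-\gamma},
\qquad
\frac{d}{n} \sim d^{\,1-\gamma},
\]
\[
\begin{array}{lll}
\alpha+\gamma>1 & \Longrightarrow\ d/(n\lambda_1)\to 0 & \text{(scenario (a))},\\[2pt]
\alpha+\gamma>1,\ \gamma<1 & \Longrightarrow\ d/(n\lambda_1)\to 0,\ d/n\to\infty & \text{(scenario (b))},\\[2pt]
\alpha+\gamma<1 & \Longrightarrow\ d/(n\lambda_1)\to\infty & \text{(scenario (c))}.
\end{array}
\]
```

On the diagonal $\alpha + \gamma = 1$ the first ratio stays bounded away from both $0$ and $\infty$, which is the balanced case treated in Section 3.2.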

Theorem 3.1. Under Assumptions 2.1 and 3.1, as $n \to \infty$, the following
results hold.

(a) then Ai/Ai ^ 1, Xj ^ * for 2 < j < [n A (d - 1)], 

and the other non-zero Xj = O as f~). In addition, u\ is consistent 

with rate (^xr) 5 > and the other Uj are subspace consistent with S = 

i 

d \ 2 



(b) If — > and ^ — > oo, then X\/X\ 1, and the rest of the non-zero 



span{uk, k > 2} with rate y^j^ 
d 

l 

Xj ~ ^. In addition, u\ is consistent with rate ^^f^ 2 , and the rest 

of the Uj are strongly inconsistent with rate (^-^-^ ■ 
(c) If $d/(n\lambda_1) \to \infty$, then the non-zero
$\hat{\lambda}_j \sim d/n$ and the corresponding eigenvectors $\hat{u}_j$ are
strongly inconsistent with rate $(n\lambda_1/d)^{1/2}$.

Having stated the main results for single-component spike models, we now 
offer several remarks regarding the conditions assumed in Theorem 3.1 and 
the connections with the existing results about PCA consistency. 

• Theorem 3.1 remains valid under more general assumptions of the
population eigenvalues. For example, if
$\lambda_2 \to \cdots \to \lambda_d \to c$ with $c$ being a constant, the
condition $\lambda_1 \gg \lambda_2$ can be relaxed to that $\lambda_1$ is at
least a constant away from $\lambda_2$ such that $\lambda_1 \succ \lambda_2$.
Hence, our results of the maximal sample eigenvector are the same as those
obtained by Johnstone and Lu (2009) [15] for the models they considered. In
addition, if ^ — > or 00, the statement u Xj ~ ^" in Theorem 3.1 can 
be replaced by "Aj c^". 

• In Theorem 3.1, the dimension $d$ can be fixed. In addition, consider
$\infty > \lambda_1 \succ \lambda_2 \to \cdots \to \lambda_d \to c$ (for a
constant $c$), which corresponds to the classical asymptotic framework
considered by Anderson (1963) [2]. Theorem 1 of Anderson (1963) [2] implied
that the maximal sample eigenvector is consistent with its corresponding
population eigenvector and the rest of the sample eigenvectors are subspace
consistent with their corresponding eigenvectors, which is consistent with
our Theorem 3.1(a).

• Assuming fixed $\lambda_1$ and $d/n \to c$ with $c$ being a constant,
Nadler [21], Johnstone and Lu [15] and Benaych-Georges and Nadakuditi [6]
obtained the results in Previous Results II - the random matrix domain
in Example 1.1, which indicate that, as $n \to \infty$, the maximal sample
eigenvector $\hat{u}_1$ is consistent when $d/n \to 0$, and inconsistent when
$d/n \to \infty$. Our Theorem 3.1 includes this as a special case. In
addition, Theorem 3.1 offers more than just relaxing the fixed $\lambda_1$
assumption: it characterizes how an increasing $\lambda_1$ interacts with the
ratio $d/n$, derives the corresponding convergence rate, and also studies the
asymptotic properties of the higher order sample eigenvalues and
eigenvectors, all of which have not been investigated before.

3.1.2. Cases with fixed n. Theorem 3.2 summarizes the results for the
fixed $n$ cases (i.e. the HDLSS domain). In comparison with Jung and
Marron (2009) [16], we make more general assumptions on the population
eigenvalues, and obtain the corresponding convergence rate results. Define K =



Theorem 3.2. Under Assumptions 2.1 and 3.1, for fixed $n$, as $d \to \infty$,
the following results hold.
(a) J/^—t-O, then X\/X\ A ^ , and the rest of the non-zero Xj/d A K .
In addition, u\ is consistent with rate {j^^ > an d the rest of the Uj 

are strongly inconsistent with rate 

(b) If £ —7- oo, then the non-zero Xj/d A K, and the corresponding Uj 
are strongly inconsistent with rate (^f) 5 ; respectively. 

3.2. Single component spike models, balanced case. The theorems in the 
previous subsections characterize the asymptotic properties of PCA under 
our general framework when either positive information or negative infor- 
mation dominates the other one. We now consider the transient cases where 
positive information and negative information are balanced, i.e. of the same 
asymptotic order. We state in Theorem 3.3 the corresponding results for sin- 
gle component spike models, and then discuss the connection with existing 
results in Nadler (2008) [21] and Jung et al. (2010) [17]. 

Suppose that the maximal eigenvalue asymptotically dominates the other 
eigenvalues, all of which have the same limit, as in 

Assumption 3.2. as $n \to \infty$, $\lambda_1 \gg \lambda_2$, $\lambda_j \to c_\lambda$, $j = 2, \cdots, d$, for a constant $c_\lambda$.



Note that Assumption 3.2 is more elegant than Assumption 3.1 in that 
it requires the higher order population eigenvalues to have the same limit. 
This ensures that there is a clear gap between signal and noise, and helps 
to derive the exact mathematical representation for the angle between the 
true eigenvectors and the corresponding estimates. 

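Theorem 3.3. Under Assumptions 2.1 and 3.2, as $n \to \infty$, if
$d/(n\lambda_1) \to c \in (0, \infty)$, then the sample eigenvalues satisfy

(3.1) $\hat{\lambda}_1 / \lambda_1 \overset{a.s}{\to} 1 + c\,c_\lambda$,
      $\frac{n}{d} \hat{\lambda}_j \overset{a.s}{\to} c_\lambda$, $2 \le j \le [n \wedge d]$;

in addition, the sample eigenvectors satisfy

(3.2) $|\langle \hat{u}_1, u_1 \rangle|^2 = \frac{1}{1 + c\,c_\lambda} + o_{a.s}(1)$,
      $|\langle \hat{u}_j, u_j \rangle|^2 = O_{a.s}\left(\frac{n}{d}\right)$, $2 \le j \le [n \wedge d]$.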



Below we comment on the results of Theorem 3.3.

• (3.2) suggests that the limiting angle between the maximal sample
eigenvector $\hat{u}_1$ and $u_1$ is between $0$ and $\pi/2$, and each of the
additional sample eigenvectors $\hat{u}_j$ is strongly inconsistent with rate
$(n/d)^{1/2}$.

• Theorems 3.1 and 3.3 together completely characterize the phase
transition behavior of the maximal sample eigenvector $\hat{u}_1$ as
$d/(n\lambda_1)$ converges to a different limit: as $n \to \infty$,
$\hat{u}_1$ starts from being consistent when $d/(n\lambda_1) \to 0$, to
being in-between consistency and strong inconsistency (with the limiting
angle between $0$ and $\pi/2$) when $d/(n\lambda_1) \to c \in (0, \infty)$,
and finally to being strongly inconsistent when $d/(n\lambda_1) \to \infty$.

• The results nicely complement the existing results of Nadler [21] and
Jung et al. [17]: [21] considered cases with a constant $\lambda_1$ and
$d/n \to c \in (0, \infty)$ as $n \to \infty$, and derived the absolute inner
product between $\hat{u}_1$ and $u_1$; [17] studied scenarios with fixed $n$
and $d/\lambda_1 \to c \in (0, \infty)$ as $d \to \infty$, and showed that the
absolute inner product is random.

• In the context of the illustrating Example 1.1, the results of [21]
correspond to the point on the vertical axis with $\alpha = 0$ and
$\gamma = 1$; the results of [17] are for the point on the horizontal axis
with $\alpha = 1$ and $\gamma = 0$; finally, our results are for the solid
line with $\alpha + \gamma = 1$, which separates the consistency and
strong-inconsistency regions.
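
The phase transition described above can be tabulated numerically. The sketch below assumes that the limit in (3.2) reads $|\langle \hat{u}_1, u_1 \rangle|^2 \to 1/(1 + c\,c_\lambda)$, where $c$ is the limit of $d/(n\lambda_1)$ and $c_\lambda$ is the common limit of the non-spike eigenvalues; the function name `limiting_angle` and the default `c_lambda = 1.0` are illustrative assumptions.

```python
import math


def limiting_angle(c: float, c_lambda: float = 1.0) -> float:
    """Limiting angle (radians) between u1_hat and u1 when d/(n*lambda_1) -> c.

    Assumes the limiting squared inner product 1 / (1 + c * c_lambda).
    """
    if c < 0 or c_lambda <= 0:
        raise ValueError("need c >= 0 and c_lambda > 0")
    return math.acos(math.sqrt(1.0 / (1.0 + c * c_lambda)))


for c in (0.0, 0.1, 1.0, 10.0, 1000.0):
    print(f"c = {c:g}: angle = {math.degrees(limiting_angle(c)):.2f} degrees")
```

Under this assumption the angle grows continuously from $0$ (the consistent end) towards $\pi/2$ (the strongly inconsistent end) as $c$ increases, which matches the in-between behavior on the boundary.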

4. Multiple component spike models. We consider multiple spike 
models with finite $m$ ($\in [1, n \wedge d]$) dominating spikes. In Section 4.1, we study
models where the dominating eigenvalues are distinct. In Section 4.2, we 
consider the cases where the eigenvalues are not all distinct, by introducing 
the concept of tiered eigenvalues. 

4.1. Multiple component spike models with distinct eigenvalues. 

4.1.1. Cases with increasing sample size n. WLOG, we assume that the
first $m$ population eigenvalues have different strengths and dominate the
rest of the population eigenvalues, which are asymptotically equivalent.

Assumption 4.1. as $n \to \infty$,
$\lambda_1 \succ \cdots \succ \lambda_m \gg \lambda_{m+1} \sim \cdots \sim \lambda_d \sim 1$.

A useful quantity, for distinguishing the various cases among eigenvectors
in the coming theorems, is

$a_l = \max_{1 \le k \le l} \frac{\lambda_{k+1}}{\lambda_k}, \quad l = 1, \cdots, m.$

This lower bound on the consecutive relative gap among the first $l$
eigenvalues provides a critical measure of the separation between the $l$-th
sample eigenvector and the first $l - 1$ sample eigenvectors.

Below we first state the main theoretical results in Theorem 4.1, and follow
up with some remarks about the theorem conditions and the connections
between the theorem and the existing results in the literature.

Similar to Theorem 3.1, Theorem 4.1 states the asymptotic properties of
the sample eigenvalues and eigenvectors in a trichotomous manner, separated
by the size of $d/(n\lambda_j)$, which again measures the relative strength of
the positive information and the negative information. The three scenarios
below and in Theorem 4.1 are arranged in a decreasing order of the amount of
the positive information:

• Theorem 4.1(a): If the amount of positive information dominates the
amount of negative information up to the $m$th spike, i.e.
$d/(n\lambda_m) \to 0$, then each of the first $m$ sample eigenvectors is
consistent, and the additional ones are subspace consistent;

• Theorem 4.1(b): Otherwise, if the amount of positive information
dominates the amount of negative information only up to the $h$th spike
($h \in [1, m]$), i.e. $d/(n\lambda_h) \to 0$ and
$d/(n\lambda_{h+1}) \to \infty$, then each of the first $h$ sample
eigenvectors is consistent, and each of the remaining higher-order sample
eigenvectors is strongly inconsistent;

• Theorem 4.1(c): Finally, if the amount of negative information always
dominates, i.e. $d/(n\lambda_1) \to \infty$, then the sample eigenvalues are
asymptotically indistinguishable, and the sample eigenvectors are strongly
inconsistent.

Theorem 4.1. Under Assumptions 2.1 and 4.1, as $n \to \infty$, the following
results hold.

(a) If ^ ->■ 0, then Xj/Xj ^4 1 for 1 < j < m, Xj ~ J for m + 1 < 
j < [n A (d — m)], and the other non-zero Xj = O a . s (^). In addition, 

Uj are consistent with rate (aj V ^x~^ 5 for 1 < j < m, and the other 
Uj are subspace consistent with S = span{uk,k > m + 1} with rate 

n\m J 

(b) If there exists a constant $h$, $1 \le h \le m$, such that
$d/(n\lambda_h) \to 0$ and $d/(n\lambda_{h+1}) \to \infty$, then
$\hat{\lambda}_j / \lambda_j \overset{a.s}{\to} 1$ for $1 \le j \le h$, and the
other non-zero $\hat{\lambda}_j \sim d/n$. In addition, $\hat{u}_j$ are
consistent with rate $(a_j \vee d/(n\lambda_j))^{1/2}$ for $1 \le j \le h$,
and the other $\hat{u}_j$ are strongly inconsistent with rate
$(n\lambda_j/d)^{1/2}$.

(c) If $d/(n\lambda_1) \to \infty$, then the non-zero
$\hat{\lambda}_j \sim d/n$, and the corresponding $\hat{u}_j$ are strongly
inconsistent with rate $(n\lambda_j/d)^{1/2}$.



We now discuss the conditions needed in the theorem and how the results 
connect with the existing ones in the literature. 

• If m = 1, Theorem 4.1 becomes Theorem 3.1 for single spike models. 

• The conclusions in Theorem 4.1 remain valid under the assumption
$\lambda_1 \succ \cdots \succ \lambda_m \succ \lambda_{m+1} \to \cdots \to \lambda_d \to c$,
as discussed after Theorem 3.1.

• In Theorem 4.1, consider fixed dimension $d$ and
$\infty > \lambda_1 \succ \cdots \succ \lambda_m \succ \lambda_{m+1} \to \cdots \to \lambda_d \to c$.
Then, Theorem 4.1(a) is consistent with the classical results implied by
Theorem 1 of Anderson [2].

• Considering fixed $\lambda_1, \cdots, \lambda_m$ and $d/n \to c$, where
$c \in (0, 1)$, Paul [23] obtained results that, when applied to Example 1.2,
give Previous Results II - the random matrix domain. As one can see, our
Theorem 4.1 relaxes the assumptions of $d/n \to c \in (0, 1)$ and that
$\lambda_1, \cdots, \lambda_m$ are fixed. In addition, we characterize how
increasing $\lambda_1, \cdots, \lambda_m$ interact with the ratio $d/n$ along
with the corresponding convergence rates, and study the asymptotic properties
of the higher order sample eigenvalues and eigenvectors, all of which have
not been investigated before.

4.1.2. Cases with fixed n. The following Theorem 4.2 considers cases
with fixed $n$. The multiple spike condition in Assumption 4.1 now becomes
that the first $m$ population eigenvalues are of different orders and
dominate the other population eigenvalues, which are asymptotically
equivalent:

Assumption 4.2. as $d \to \infty$,
$\lambda_1 \gg \cdots \gg \lambda_m \gg \lambda_{m+1} \sim \cdots \sim \lambda_d \sim 1$.

Note that for fixed $n$ and $d \to \infty$, assuming
$\lambda_j \succ \lambda_{j+1}$ cannot asymptotically separate the
corresponding sample eigenvalues $\hat{\lambda}_j$ and $\hat{\lambda}_{j+1}$.
Thus, we need to replace Assumption 4.1 with Assumption 4.2 to asymptotically

separate the first m sample eigenvalues. Define K = lim^oo — 3= " t t 1 3 . 

Theorem 4.2. Under Assumptions 2.1 and 4.2, for fixed $n$, as $d \to \infty$,
the following results hold.

fa) If there exists a constant h, 1 < h < m, such that — > and > 

oo, then Xj/Xj A for 1 < j < h, and the rest of the non-zero 



Xj/d K. In addition, Uj are consistent 




1 < j < h, and the other Uj are strongly inconsistent with rate 
(b) If $d/\lambda_1 \to \infty$, then the non-zero $\hat\lambda_j/d \xrightarrow{a.s.} K$, and the corresponding $\hat u_j$
are strongly inconsistent with rate $(\lambda_j/d)^{1/2}$.

4.2. Multiple component spike models with tiered eigenvalues. We now 
consider models where the m eigenvalues can be grouped into r tiers, where 
the eigenvalues within the same tier are either the same or have the same 
limit or are of the same order, and the eigenvalues within different tiers have 
either different limits or are of different orders. 

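4.2.1. Cases with increasing sample size n. To fix ideas, the first $m$
eigenvalues are grouped into $r$ tiers, where there are $q_l$ $(> 0)$ eigenvalues in the $l$th
tier with $\sum_{l=1}^{r} q_l = m$. Define $q_0 = 0$, $q_{r+1} = d - \sum_{l=1}^{r} q_l$, and the index set
of the eigenvalues in the $l$th tier as

$$H_l = \Big\{ \sum_{k=0}^{l-1} q_k + 1,\ \sum_{k=0}^{l-1} q_k + 2,\ \cdots,\ \sum_{k=0}^{l-1} q_k + q_l \Big\}, \quad l = 1, \cdots, r+1. \tag{4.1}$$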
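To make the tier bookkeeping concrete, the following minimal sketch builds the index sets from a list of tier sizes. The function name `tier_index_sets` and the example sizes are illustrative assumptions, not part of the paper; indices are 1-based to match the text.

```python
def tier_index_sets(q, d):
    """Return the 1-based index sets H_1, ..., H_{r+1} for tier sizes q = [q_1, ..., q_r].

    The last set H_{r+1} collects the remaining d - m non-spike indices,
    where m = q_1 + ... + q_r. (Hypothetical helper for illustration.)
    """
    m = sum(q)
    if any(size <= 0 for size in q) or m > d:
        raise ValueError("tier sizes must be positive and sum to at most d")
    sizes = list(q) + [d - m]      # q_{r+1} = d - sum(q)
    sets, start = [], 0            # start = q_0 + ... + q_{l-1}
    for size in sizes:
        sets.append(list(range(start + 1, start + size + 1)))
        start += size
    return sets


H = tier_index_sets([2, 1, 3], d=10)
print(H)  # [[1, 2], [3], [4, 5, 6], [7, 8, 9, 10]]
```

With $q = (2, 1, 3)$ and $d = 10$, the first three sets are the spike tiers and the last set holds the $d - m = 4$ non-spike indices.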

Assume the eigenvalues in the $l$th tier have the same limit $\delta_l$ $(> 0)$, i.e.

Assumption 4.3. $\lim_{n \to \infty} \lambda_j/\delta_l = 1$, $j \in H_l$, $l = 1, \cdots, r$.

The above assumption suggests that it is impossible to separate the sample
eigenvectors whose indices are in the same tier, and motivates us to
consider subspace consistency. In addition, we assume that the population
eigenvalues from different tiers are asymptotically different and dominate
the other population eigenvalues, which are asymptotically equivalent:

Assumption 4.4. As $n \to \infty$, $\delta_1 \succ \cdots \succ \delta_r \gg \lambda_{m+1} \sim \cdots \sim \lambda_d \sim 1$.

Under the above setup, we have the following Theorem 4.3 which suggests 
that the eigenvalues with the same limit can not be consistently estimated 
individually; the corresponding eigenvector estimates are either subspace 
consistent with the linear space spanned by the eigenvectors, or strongly 
inconsistent. Similar to the earlier theorems, Theorem 4.3 is arranged ac- 
cording to a decreasing amount of positive information: 

• Theorem 4.3(a): If the amount of positive information dominates the
amount of negative information up to the $r$th tier, i.e. $d/(n\delta_r) \to 0$, then
the estimates for the eigenvectors in the first $r$ tiers are subspace
consistent, and the estimates for the rest are also subspace consistent (but
at a different rate);

• Theorem 4.3(b): Otherwise, if the amount of positive information
dominates the amount of negative information only up to the $h$th tier
($h \in [1, r]$), i.e. $d/(n\delta_h) \to 0$ and $d/(n\delta_{h+1}) \to \infty$, then the estimates for
the eigenvectors in the first $h$ tiers are subspace consistent, and the
estimates for the remaining eigenvectors are strongly inconsistent;

• Theorem 4.3(c): Finally, if the amount of negative information always
dominates, i.e. $d/(n\delta_1) \to \infty$, then the sample eigenvalues are
asymptotically indistinguishable, and the sample eigenvectors are strongly
inconsistent.

In this setting, one key to distinguishing the cases in the theorem is

$$a_l = \max_{1 \le k \le l} \frac{\delta_{k+1}}{\delta_k}, \quad l = 1, \cdots, r, \tag{4.2}$$

where $\delta_{r+1} = 1$, which measures the separation between the sample
eigenvectors in the $l$th tier and those in the first $l - 1$ tiers. Define the subspace
$S_l = \mathrm{span}\{u_k, k \in H_l\}$ for $l = 1, \cdots, r + 1$.
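As a small numerical illustration of the separation measure, the sketch below evaluates $a_l$ for a given list of tier limits, assuming (4.2) reads $a_l = \max_{1 \le k \le l} \delta_{k+1}/\delta_k$ with $\delta_{r+1} = 1$. The function name and the example values are hypothetical.

```python
def separation_measures(delta):
    """Compute a_l = max_{1<=k<=l} delta_{k+1}/delta_k for l = 1..r, with delta_{r+1} = 1.

    `delta` is the list [delta_1, ..., delta_r] of tier limits (assumed decreasing).
    """
    extended = list(delta) + [1.0]                 # append delta_{r+1} = 1
    ratios = [extended[k + 1] / extended[k] for k in range(len(delta))]
    a, running_max = [], 0.0
    for ratio in ratios:                           # running maximum over k <= l
        running_max = max(running_max, ratio)
        a.append(running_max)
    return a


a = separation_measures([100.0, 20.0, 10.0])
print(a)  # [0.2, 0.5, 0.5]
```

Here the second and third tiers are only a factor of 2 apart, so $a_2 = a_3 = 0.5$ even though the third tier is well separated from the noise level.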

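Theorem 4.3. Under Assumptions 2.1, 4.3 and 4.4, as $n \to \infty$, the
following results hold.

(a) If $d/(n\delta_r) \to 0$, then $\hat\lambda_j/\lambda_j \xrightarrow{a.s.} 1$ for $1 \le j \le m$, $\hat\lambda_j \overset{a.s.}{\sim} d/n$ for
$m + 1 \le j \le [n \wedge (d - m)]$, and the rest of the non-zero $\hat\lambda_j = O_{a.s.}(d/n)$. In
addition, $\hat u_j$ are subspace consistent with $S_l$ with rate $\big(a_l \vee \frac{d}{n\delta_l}\big)^{1/2}$ for
$j \in H_l$, $l = 1, \cdots, r$, and rate $\big(a_r \vee \frac{d}{n\delta_r}\big)^{1/2}$ for $j \in H_{r+1}$, respectively.

(b) If there exists a constant $h$, $1 \le h \le r$, such that $d/(n\delta_h) \to 0$ and
$d/(n\delta_{h+1}) \to \infty$, then $\hat\lambda_j/\lambda_j \xrightarrow{a.s.} 1$ for $j \in H_l$, $l = 1, \cdots, h$, and the other non-zero
$\hat\lambda_j \overset{a.s.}{\sim} d/n$. In addition, $\hat u_j$ are subspace consistent with $S_l$ with rate
$\big(a_l \vee \frac{d}{n\delta_l}\big)^{1/2}$ for $j \in H_l$, $l = 1, \cdots, h$, and the other $\hat u_j$ are strongly
inconsistent with rate $(n\lambda_j/d)^{1/2}$.

(c) If $d/(n\delta_1) \to \infty$, then the non-zero $\hat\lambda_j \overset{a.s.}{\sim} d/n$, and the corresponding $\hat u_j$ are
strongly inconsistent with rate $(n\lambda_j/d)^{1/2}$.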



The following comments can be made for the results of Theorem 4.3. 

• If each tier only contains one eigenvalue, i.e. $q_1 = \cdots = q_r = 1$, then
Theorem 4.3 becomes Theorem 4.1.

• The cases covered by Theorem 4.3 were not studied by Paul (2007) [23],
which required the eigenvalues to be individually estimable.

• The asymptotic properties in Theorem 4.3 will not change under the
assumption $\delta_1 \succ \cdots \succ \delta_r \succ \lambda_{m+1} \to \cdots \to \lambda_d \to c$.

• In Theorem 4.3, the dimension $d$ can be fixed. In addition, suppose
$\infty > \delta_1 \succ \cdots \succ \delta_r \succ \lambda_{m+1} \to \cdots \to \lambda_d \to c$ and the eigenvalues
satisfy (4.3). Then, the results of Theorem 4.3(a) are consistent
with the classical asymptotic subspace consistency results implied by
Theorem 1 of Anderson (1963) [2].

4.2.2. Cases with fixed n. Similar results can be obtained for the fixed 
n cases (i.e. the HDLSS domain) as summarized below in Theorem 4.4. For 
that, we assume that as d — > oo, the first m eigenvalues fall into r tiers, where 
the eigenvalues in the same tier are asymptotically equivalent, as stated in 
the following assumption: 

Assumption 4.5. $\lambda_j \sim \delta_l$, $j \in H_l$, $l = 1, \cdots, r$.

Different from Assumption 4.3 for diverging sample size n, now with a 
fixed n, the eigenvalues within the same tier are assumed to be of the same 
order, rather than of the same limit when n increases to oo. As we will see 
below in Theorem 4.4, one can not separately estimate the eigenvalues of 
the same order when n is fixed, which is feasible with an increasing n as long 
as they do not have the same limit as previously shown in Theorem 4.3. 

In addition, we assume that the population eigenvalues from different
tiers are of different orders and dominate the remaining eigenvalues, which are
asymptotically equivalent:

Assumption 4.6. As $d \to \infty$, $\delta_1 \gg \cdots \gg \delta_r \gg \lambda_{m+1} \sim \cdots \sim \lambda_d \sim 1$.

Note that for fixed $n$ and $d \to \infty$, the assumption $\delta_l \succ \delta_{l+1}$ cannot
guarantee asymptotic separation of the corresponding sample eigenvalues
$\hat\lambda_j$ for $j \in H_l$ and $\hat\lambda_j$ for $j \in H_{l+1}$. Thus, we need to replace Assumption 4.4
with Assumption 4.6 in order to asymptotically separate the first $r$ subgroups
of sample eigenvalues. Define $K = \lim_{d \to \infty} (nd)^{-1} \sum_{j=m+1}^{d} \lambda_j$.

Theorem 4.4. Under Assumptions 2.1, 4.5 and 4.6, for fixed $n$, as $d \to \infty$,
the following results hold.

(a) If there exists a constant $h$, $1 \le h \le r$, such that $d/\delta_h \to 0$ and
$d/\delta_{h+1} \to \infty$, then $\hat\lambda_j \sim \lambda_j$ for $j \in H_l$, $l = 1, \cdots, h$, and the other non-zero
$\hat\lambda_j/d \xrightarrow{a.s.} K$. In addition, $\hat u_j$ are subspace consistent with $S_l$ with rate



i 



(b) 




5. Discussion. Throughout the paper, we assume that the small eigenvalues
have the same order as 1, i.e. $\lambda_{m+1} \sim \cdots \sim \lambda_d \sim 1$. In fact, this is a
convenient WLOG choice. Our results remain valid when these small
eigenvalues are not of the same order, and even when some of them are 0. For
example, suppose $\lambda_{d_1+1} = \cdots = \lambda_d = 0$ for $m + 1 < d_1 < d$. As shown
in Section C of the supplementary material [29], the asymptotic properties
of PCA are independent of the basis choice for the $d$-dimensional space. If
the population eigenvectors $u_j$, $j = 1, \ldots, d$, are chosen as the basis of the
$d$-dimensional space, the population covariance matrix becomes

$$\Sigma = \begin{pmatrix} \Lambda_1 & 0_{d_1 \times (d - d_1)} \\ 0_{(d - d_1) \times d_1} & 0_{(d - d_1) \times (d - d_1)} \end{pmatrix},$$

where $\Lambda_1 = \mathrm{diag}\{\lambda_1, \ldots, \lambda_{d_1}\}$ and $0_{k \times l}$ is the $k$-by-$l$ zero matrix. Then, the asymptotic properties of PCA
under the population covariance matrix $\Sigma$ are the same as those under the
covariance matrix $\Lambda_1$. Therefore, we only need to replace the dimension $d$
by the effective dimension $d_1$, and all the earlier results can be obtained.

It is interesting but challenging to extend our results to non-Gaussian
cases. Under the non-Gaussian assumption, the distribution equivalence
results in (2.1) will not hold in general. Thus, one challenge in the
non-Gaussian case is that we can no longer assume the population mean $\xi = 0$,
and have to study properties of the sample covariance matrix
$\hat\Sigma = (n - 1)^{-1} \sum_{i=1}^{n} (X_i - \bar X)(X_i - \bar X)^T$. Note that $X_i - \bar X$ for $i = 1, \cdots, n$ are not
independent, which causes considerable difficulty in deriving the asymptotic
properties of $\hat\Sigma$. Furthermore, Lemma 6.3 is fundamental for us to derive
the asymptotic properties of PCA in the current paper, and extension of
Lemma 6.3 to non-Gaussian cases will require a new technical approach.

6. Proofs. We now provide detailed proofs for the general Theorem 4.3. 
To save space, proofs for Theorems 3.1, 3.2, 3.3, 4.1, 4.2, and 4.4 (which are 
often similar, and simpler) are provided in the supplement [29]. We first 
provide some overview in Section 6.1 and list four lemmas in Section 6.2,
and then prove the asymptotic properties of the sample eigenvalues and the 
sample eigenvectors in Sections 6.3 and 6.4, respectively. 

In this paper, we study the consistency and strong inconsistency of PCA
through the angle or the inner product between a sample eigenvector and the
corresponding population eigenvector. We first note that this angle has a nice
invariance property: it does not depend on the specific choice of the basis for
the $d$-dimensional space, as discussed in detail in the supplement [29]. Given
this invariance property, for the rest of the paper, we choose to use the
population eigenvectors $u_j$, $j = 1, \ldots, d$, as the basis of the $d$-dimensional space,
which is equivalent to assuming that $X_i$, $i = 1, \ldots, n$, has a $d$-dimensional
normal distribution with mean zero and a diagonal covariance matrix
$\Sigma = \Lambda = \mathrm{diag}\{\lambda_1, \ldots, \lambda_d\}$. This will simplify our mathematical analysis; see
for example (6.9) and (6.10).

We consider general cases where the first $m$ eigenvalues are grouped into
$r$ tiers, and WLOG we assume that $\lambda_1 = \cdots = \lambda_{q_1} = \delta_1$, $\cdots$,
$\lambda_{\sum_{l=0}^{r-1} q_l + 1} = \cdots = \lambda_m = \delta_r$, where $q_0 = 0$ and $q_l$ are positive integers for $l \ge 1$. In addition,
we assume that each ratio $\delta_j/\delta_i$, where $1 \le i < j \le r$, converges to a constant
less than 1 as $n \to \infty$. (The following arguments can be extended to cases
where only the upper limits of the ratios exist as stated in the theorems,
through taking a converging subsequence of the diverging sequence of $n$.)

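6.1. Overview. Our proof makes use of the connection between the sample
covariance matrix $\hat\Sigma$ and its dual matrix $\hat\Sigma_D$, which share the same
nonzero eigenvalues. To fix ideas, define $Z_i = \Lambda^{-1/2} X_i$ for $i = 1, \ldots, n$. Then,
the $Z_i$'s are i.i.d. $d$-dimensional standard normal random vectors. Denote the
$j$th entry of $Z_i$ as $z_{i,j}$ for $j = 1, \ldots, d$. For a fixed $j$, define

$$\tilde Z_j = (z_{1,j}, \cdots, z_{n,j})^T, \quad j = 1, \ldots, d,$$

which are i.i.d. $n$-dimensional standard normal random vectors. Note that the
dual matrix can be expressed as

$$\hat\Sigma_D = \frac{1}{n} \sum_{j=1}^{d} \lambda_j \tilde Z_j \tilde Z_j^T,$$

which can be rewritten as the sum of two matrices as follows:

$$\hat\Sigma_D = A + B, \quad \text{with} \quad A = \frac{1}{n} \sum_{j=1}^{m} \lambda_j \tilde Z_j \tilde Z_j^T, \quad B = \frac{1}{n} \sum_{j=m+1}^{d} \lambda_j \tilde Z_j \tilde Z_j^T. \tag{6.1}$$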
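The claim that the sample covariance matrix and its dual share the same nonzero eigenvalues, and the split of the dual matrix into a spike part $A$ and a non-spike part $B$, can be checked numerically. The sketch below assumes a mean-zero model with diagonal covariance, takes the sample covariance as $n^{-1} X X^T$ and the dual as $n^{-1} X^T X$ for a $d \times n$ data matrix $X$, and uses illustrative sizes and eigenvalues that are not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, m = 8, 5, 2                                  # illustrative sizes (assumption)
lam = np.array([50.0, 20.0] + [1.0] * (d - m))     # population eigenvalues

Z = rng.standard_normal((d, n))                    # columns Z_i are i.i.d. N(0, I_d)
X = np.sqrt(lam)[:, None] * Z                      # X_i = Lambda^{1/2} Z_i

sample_cov = X @ X.T / n                           # d x d sample covariance (mean zero)
dual = X.T @ X / n                                 # n x n dual matrix

# Row j of Z plays the role of the n-vector \tilde Z_j.
A = sum(lam[j] * np.outer(Z[j], Z[j]) for j in range(m)) / n        # spike part
B = sum(lam[j] * np.outer(Z[j], Z[j]) for j in range(m, d)) / n     # non-spike part

top = min(n, d)
eig_cov = np.sort(np.linalg.eigvalsh(sample_cov))[::-1][:top]
eig_dual = np.sort(np.linalg.eigvalsh(dual))[::-1][:top]

print(np.allclose(dual, A + B))          # dual matrix equals A + B
print(np.allclose(eig_cov, eig_dual))    # same leading (nonzero) eigenvalues
```

Working with the $n \times n$ dual matrix is convenient when $d \gg n$, since its size does not grow with the dimension.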
The proof involves the following steps. First, we study the asymptotic
properties of the eigenvalues of $A$ and $B$ in Lemmas 6.1 and 6.2,
respectively. Then, Wielandt's inequality (Rao [26]), restated as
Lemma 6.4, enables us to establish the asymptotic properties of the
eigenvalues of the dual matrix in Section 6.3. Finally, we derive the asymptotic
properties of the sample eigenvectors of $\hat\Sigma$ in Section 6.4. Some intuitive
ideas are provided in the supplement [29] to help in understanding the proof.

6.2. Lemmas. We list four lemmas that are used in our proof. Lem- 
mas 6.1 and 6.2 are proven in our online supplement, the proofs of which 
need the following Lemma 6.3 that studies asymptotic properties of the 
largest and smallest eigenvalues of the Wishart distribution. 

Lemma 6.1. As $n \to \infty$, the eigenvalues of the matrix $A$ in (6.1) satisfy

$$\frac{\lambda_k(A)}{\lambda_k} \xrightarrow{a.s.} 1, \quad k = 1, \ldots, m,$$

where $\lambda_k(A)$ denotes the $k$th largest eigenvalue of the matrix $A$.

Lemma 6.2. As $n \to \infty$, the eigenvalues of the matrix $B$ in (6.1) satisfy

$$\lambda_k(B) \overset{a.s.}{\sim} \frac{d}{n}, \quad k = 1, \cdots, [n \wedge (d - m)].$$



Lemma 6.3. Suppose $B = \frac{1}{s} V_s V_s^T$, where $V_s$ is an $m \times s$ random matrix
composed of i.i.d. standard normal random variables. As $s \to \infty$ and
$m/s \to c \in [0, \infty)$, the largest and smallest non-zero eigenvalues of $B$ converge
almost surely to $(1 + \sqrt{c})^2$ and $(1 - \sqrt{c})^2$, respectively.

Lemma 6.3 can be found in Geman [9] and Silverstein [30] for $c \in (0, \infty)$.
One can easily extend it to include $c = 0$ through simple coupling arguments,
as in (22) of Johnstone and Lu [15].
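A quick simulation gives a feel for the limits in Lemma 6.3. The sketch below draws one Gaussian matrix with $m/s = 0.1$ and compares the extreme eigenvalues of $s^{-1} V_s V_s^T$ with $(1 \pm \sqrt{c})^2$; the sizes and the seed are arbitrary illustrative choices, and a single finite-sample draw only approximates the almost sure limit.

```python
import numpy as np

rng = np.random.default_rng(1)
m, s = 200, 2000                       # m/s = 0.1 plays the role of c (assumption)
c = m / s

V = rng.standard_normal((m, s))        # m x s matrix of i.i.d. N(0, 1) entries
eig = np.linalg.eigvalsh(V @ V.T / s)  # eigenvalues of B = V V^T / s

upper_limit = (1 + np.sqrt(c)) ** 2
lower_limit = (1 - np.sqrt(c)) ** 2
print(eig.max(), upper_limit)          # largest eigenvalue vs. (1 + sqrt(c))^2
print(eig.min(), lower_limit)          # smallest eigenvalue vs. (1 - sqrt(c))^2
```

Since $m < s$ here, all $m$ eigenvalues are non-zero, so the smallest eigenvalue is the relevant quantity for the lower limit.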

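Lemma 6.4. (Wielandt's Inequality [26]). If $A, B$ are $m \times m$ real symmetric
matrices, then for all $k = 1, \ldots, m$,

$$\left.\begin{array}{l} \lambda_k(A) + \lambda_m(B) \\ \lambda_{k+1}(A) + \lambda_{m-1}(B) \\ \quad\vdots \\ \lambda_m(A) + \lambda_k(B) \end{array}\right\} \le \lambda_k(A + B) \le \left\{\begin{array}{l} \lambda_k(A) + \lambda_1(B) \\ \lambda_{k-1}(A) + \lambda_2(B) \\ \quad\vdots \\ \lambda_1(A) + \lambda_k(B) \end{array}\right.$$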
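The eigenvalue bounds of Lemma 6.4 can be sanity-checked on random symmetric matrices. The sketch below reads the lemma as: for each $k$, $\lambda_k(A + B)$ is at least every sum $\lambda_{k+i}(A) + \lambda_{m-i}(B)$, $i = 0, \ldots, m-k$, and at most every sum $\lambda_{k-i}(A) + \lambda_{1+i}(B)$, $i = 0, \ldots, k-1$, with eigenvalues sorted in decreasing order. The helper names and the test matrices are illustrative assumptions.

```python
import numpy as np


def eig_desc(M):
    """Eigenvalues of a symmetric matrix in decreasing order."""
    return np.sort(np.linalg.eigvalsh(M))[::-1]


def wielandt_bounds_hold(A, B, tol=1e-10):
    """Check the lower and upper bounds on lambda_k(A + B) for every k."""
    m = A.shape[0]
    a, b, s = eig_desc(A), eig_desc(B), eig_desc(A + B)
    for k in range(1, m + 1):                      # 1-based k, as in the lemma
        lower = max(a[k + i - 1] + b[m - i - 1] for i in range(m - k + 1))
        upper = min(a[k - i - 1] + b[i] for i in range(k))
        if not (lower - tol <= s[k - 1] <= upper + tol):
            return False
    return True


rng = np.random.default_rng(3)
G1, G2 = rng.standard_normal((6, 6)), rng.standard_normal((6, 6))
A, B = (G1 + G1.T) / 2, (G2 + G2.T) / 2            # random symmetric matrices
print(wielandt_bounds_hold(A, B))                  # True
```

The proofs below only use the first and last rows of each bracket, for example $\lambda_j(A) + \lambda_n(B) \le \lambda_j(A + B) \le \lambda_j(A) + \lambda_1(B)$.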
6.3. Asymptotic properties of the sample eigenvalues. We now study the
asymptotic properties of the sample eigenvalues $\hat\lambda_j$, for $j = 1, \cdots, [n \wedge d]$,
which are the same as the eigenvalues of the dual matrix $\hat\Sigma_D$, denoted as
$\lambda_j(\hat\Sigma_D) = \lambda_j(A + B)$.

According to Lemma 6.4, we have that

$$\lambda_j(A) + \lambda_n(B) \le \hat\lambda_j \le \lambda_j(A) + \lambda_1(B),$$

which suggests

$$\frac{\lambda_j(A)}{\lambda_j} + \frac{\lambda_n(B)}{\lambda_j} \le \frac{\hat\lambda_j}{\lambda_j} \le \frac{\lambda_j(A)}{\lambda_j} + \frac{\lambda_1(B)}{\lambda_j}. \tag{6.2}$$

In addition, note that Lemma 6.2 suggests that $\lambda_n(B) \le \lambda_{[n \wedge (d - m)]}(B) \overset{a.s.}{\sim} d/n$
and $\lambda_1(B) \overset{a.s.}{\sim} d/n$. Below we consider three scenarios separately.

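First, if there exists $h \in [1, r]$ such that $d/(n\delta_h) \to 0$, then $d/(n\lambda_j) \to 0$ for
$j \in H_l$, $l = 1, \cdots, h$, where $H_l$ is the index set of the eigenvalues in the $l$th
tier. Thus, we have

$$\frac{\lambda_n(B)}{\lambda_j} \xrightarrow{a.s.} 0 \quad \text{and} \quad \frac{\lambda_1(B)}{\lambda_j} \xrightarrow{a.s.} 0, \quad j \in H_l,\ l = 1, \cdots, h. \tag{6.3}$$

The above (6.3), together with (6.2) and Lemma 6.1, leads to

$$\frac{\hat\lambda_j}{\lambda_j} \xrightarrow{a.s.} 1, \quad j \in H_l,\ l = 1, \cdots, h. \tag{6.4}$$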

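Secondly, if $d/(n\delta_{h+1}) \to \infty$, then we have $d \gg n$ and $[n \wedge (d - m)] = n$. For
$j \in H_l$, $l > h$, Lemma 6.4 suggests that

$$\frac{n}{d}\lambda_j(A) + \frac{n}{d}\lambda_n(B) \le \frac{n}{d}\hat\lambda_j \le \frac{n}{d}\lambda_j(A) + \frac{n}{d}\lambda_1(B). \tag{6.5}$$

Lemma 6.1, together with the condition $d/(n\delta_{h+1}) \to \infty$, suggests that $\frac{n}{d}\lambda_j(A) \xrightarrow{a.s.} 0$
for $j \in H_l$ and $l > h$. Using Lemma 6.2, we have that $\frac{n}{d}\lambda_n(B) \overset{a.s.}{\sim} 1$ and
$\frac{n}{d}\lambda_1(B) \overset{a.s.}{\sim} 1$. The above, combined with (6.5), suggests that

$$\hat\lambda_j \overset{a.s.}{\sim} \frac{d}{n}, \quad \sum_{l=1}^{h} q_l + 1 \le j \le [n \wedge d]. \tag{6.6}$$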

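Finally, if $d/(n\delta_r) \to 0$, (6.3) suggests that $\hat\lambda_j/\lambda_j \xrightarrow{a.s.} 1$, $1 \le j \le m$. In addition,
Lemma 6.4 suggests that

$$\frac{n}{d}\lambda_{[j + n - n \wedge (d - m)]}(A) + \frac{n}{d}\lambda_{[n \wedge (d - m)]}(B) \le \frac{n}{d}\hat\lambda_j \le \frac{n}{d}\lambda_j(A) + \frac{n}{d}\lambda_1(B). \tag{6.7}$$

Note that the rank of $A$ is less than or equal to $m$, which means that for
$j > m$, $\lambda_{[j + n - n \wedge (d - m)]}(A) = \lambda_j(A) = 0$. Furthermore, from Lemma 6.2, we
have that $\frac{n}{d}\lambda_1(B) \overset{a.s.}{\sim} \frac{n}{d}\lambda_{[n \wedge (d - m)]}(B) \overset{a.s.}{\sim} 1$. Combining the above with (6.7),
we have that $\hat\lambda_j \overset{a.s.}{\sim} d/n$, $m + 1 \le j \le [n \wedge (d - m)]$. For $j \in [[n \wedge (d - m)] + 1, n \wedge d]$,
given that $\hat\lambda_j \le \hat\lambda_{[n \wedge (d - m)]}$, it follows that $\hat\lambda_j = O_{a.s.}(d/n)$.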

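The above arguments can be summarized as follows: if $d/(n\delta_r) \to 0$,

$$\begin{cases} \hat\lambda_j/\lambda_j \xrightarrow{a.s.} 1, & 1 \le j \le m, \\ \hat\lambda_j \overset{a.s.}{\sim} d/n, & m + 1 \le j \le [n \wedge (d - m)], \\ \hat\lambda_j = O_{a.s.}(d/n), & [n \wedge (d - m)] + 1 \le j \le [n \wedge d]. \end{cases} \tag{6.8}$$

Note that if $[n \wedge (d - m)] + 1 > [n \wedge d]$, the last term disappears.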

Combining (6.4), (6.6), and (6.8), we can get the asymptotic properties 
of sample eigenvalues in Theorem 4.3. 

6.4. Asymptotic properties of the sample eigenvectors. We first state two
results that simplify the proof. As mentioned above, in light of the invariance
property of the angle, we choose the population eigenvectors $u_j$, $j = 1, \ldots, d$,
as the basis of the $d$-dimensional space. It then follows that $u_j = e_j$, where
the $j$th component of $e_j$ equals 1 and all the other components equal
zero. This suggests that

$$|\langle \hat u_j, u_j \rangle|^2 = |\langle \hat u_j, e_j \rangle|^2 = \hat u_{j,j}^2, \tag{6.9}$$

and for any index set $H$,

$$\cos\left[\mathrm{angle}\left(\hat u_j, \mathrm{span}\{u_k, k \in H\}\right)\right] = \Big(\sum_{k \in H} \hat u_{k,j}^2\Big)^{1/2}. \tag{6.10}$$
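Because the population eigenvectors are taken as the coordinate basis, the cosine of the angle between a vector and a subspace spanned by basis vectors reduces to the norm of the selected coordinates divided by the norm of the vector. The minimal sketch below illustrates this; the function name is hypothetical and the index set is 1-based to match the text.

```python
import numpy as np


def cos_angle_to_coordinate_subspace(v, index_set):
    """Cosine of the angle between v and span{e_k : k in index_set} (1-based indices)."""
    v = np.asarray(v, dtype=float)
    idx = np.array(sorted(index_set)) - 1          # convert to 0-based positions
    # The projection onto a coordinate subspace keeps only the selected entries.
    return np.sqrt(np.sum(v[idx] ** 2)) / np.linalg.norm(v)


v = np.array([3.0, 0.0, 4.0, 0.0])
print(cos_angle_to_coordinate_subspace(v, {1}))      # 0.6
print(cos_angle_to_coordinate_subspace(v, {1, 3}))   # 1.0
```

For a unit-norm sample eigenvector the denominator equals 1, so the cosine is simply the square root of the sum of squared entries indexed by $H$.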



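As a reminder, the population eigenvalues are grouped into $r + 1$ tiers and
the index set $H_l$ of the eigenvalues in the $l$th tier is defined in (4.1). Define

$$\hat U_{k,l} = (\hat u_{i,j})_{i \in H_k,\, j \in H_l}, \quad 1 \le k, l \le r + 1. \tag{6.11}$$

Then, the sample eigenvector matrix $\hat U$ can be rewritten as follows:

$$\hat U = [\hat u_1, \hat u_2, \cdots, \hat u_d] = \begin{pmatrix} \hat U_{1,1} & \hat U_{1,2} & \cdots & \hat U_{1,r+1} \\ \hat U_{2,1} & \hat U_{2,2} & \cdots & \hat U_{2,r+1} \\ \vdots & \vdots & & \vdots \\ \hat U_{r+1,1} & \hat U_{r+1,2} & \cdots & \hat U_{r+1,r+1} \end{pmatrix}.$$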

To derive the asymptotic properties of the sample eigenvectors Uj, we 
consider the three scenarios of Theorem 4.3 separately. 
6.4.1. Scenario (b) in Theorem 4.3. Under this scenario, there exists a
constant $h \in [1, r]$, such that $d/(n\delta_h) \to 0$ and $d/(n\delta_{h+1}) \to \infty$. From (6.10), to show
the subspace consistency with $S_l$ and rate $\big(a_l \vee \frac{d}{n\delta_l}\big)^{1/2}$, we only need to show
that

$$\sum_{k \in H_l} \hat u_{k,j}^2 = 1 + o_{a.s.}(a_l) \vee O_{a.s.}\Big(\frac{d}{n\delta_l}\Big), \quad j \in H_l,\ l = 1, \cdots, h, \tag{6.12}$$

where, as defined in (4.2) in Section 4.2, $a_l = \max_{1 \le k \le l} \delta_{k+1}/\delta_k$, $l = 1, \cdots, r$.
Below we provide the proof for $l = 1$. The process is similar for $l = 2, \cdots, h$,
and is omitted to save space.

Note that for $l = 1$, the left hand side of (6.12) becomes the sum of
squares of the column elements in the matrix $\hat U_{1,1}$ (defined in (6.11)). Thus,
to prove (6.12), we first show that this sum of squares converges to 1, and
then establish the convergence rate $a_1 \vee \frac{d}{n\delta_1}$.

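For the first step, let $Z = (Z_1, \cdots, Z_n)$, where $Z_i = \Lambda^{-1/2} X_i$ as defined in
Section 6.1. Denote $S = \Lambda^{-1/2} \hat U \hat\Lambda^{1/2}$, where $\hat U$ is the sample eigenvector matrix
and $\hat\Lambda$ is the sample eigenvalue matrix defined in (2.2). We can show that
$S S^T = \frac{1}{n} Z Z^T$. Considering the $k$th diagonal entry of the matrices on the
two sides and noting that $s_{k,j} = \lambda_k^{-1/2} \hat\lambda_j^{1/2} \hat u_{k,j}$, we have the following:

$$\frac{1}{n}\sum_{i=1}^{n} z_{i,k}^2 = \sum_{j=1}^{d} s_{k,j}^2 = \lambda_k^{-1} \sum_{j=1}^{d} \hat\lambda_j \hat u_{k,j}^2. \tag{6.13}$$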
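The diagonal identity relating the standardized data to the sample eigen-decomposition is exact for any data set, so it can be verified directly. The sketch below assumes the mean-zero sample covariance $n^{-1} X X^T$ with $X_i = \Lambda^{1/2} Z_i$; the sizes, eigenvalues, and variable names are illustrative choices, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(2)
d, n = 6, 4                                        # illustrative sizes (assumption)
lam = np.array([30.0, 10.0, 1.0, 1.0, 1.0, 1.0])   # population eigenvalues

Z = rng.standard_normal((d, n))                    # columns Z_i ~ N(0, I_d)
X = np.sqrt(lam)[:, None] * Z                      # X_i = Lambda^{1/2} Z_i

# Sample eigenvalues and eigenvectors (columns of U_hat) of the sample covariance.
lam_hat, U_hat = np.linalg.eigh(X @ X.T / n)

lhs = (Z ** 2).mean(axis=1)                        # n^{-1} sum_i z_{i,k}^2, one entry per k
rhs = (U_hat ** 2) @ lam_hat / lam                 # lambda_k^{-1} sum_j lam_hat_j * u_hat_{k,j}^2

print(np.allclose(lhs, rhs))                       # True
```

Both sides are the $k$th diagonal entry of $\Lambda^{-1/2} \hat\Sigma \Lambda^{-1/2}$, which is why the identity holds without any asymptotics.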



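As shown earlier, $\frac{1}{n}\sum_{i=1}^{n} z_{i,k}^2 \xrightarrow{a.s.} 1$, which suggests that $\hat u_{k,j}^2 = O_{a.s.}(\lambda_k/\hat\lambda_j)$ as $n \to \infty$.
Then, given the asymptotic properties of $\hat\lambda_j$ in (b) of Theorem 4.3
(Section 6.3), it follows that

$$\hat u_{k,j}^2 = \begin{cases} O_{a.s.}\big(\frac{\lambda_k}{\lambda_j}\big), & j \in H_l,\ l = 1, \cdots, h, \\ O_{a.s.}\big(\frac{n\lambda_k}{d}\big), & j = \sum_{l=1}^{h} q_l + 1, \cdots, [n \wedge d]. \end{cases} \tag{6.14}$$

In addition, the $j$th diagonal entry of $S^T S$ is less than or equal to its
largest eigenvalue, i.e. the largest eigenvalue of $\frac{1}{n} Z^T Z$. Hence, we have

$$\hat\lambda_j \sum_{k=1}^{d} \lambda_k^{-1} \hat u_{k,j}^2 = \sum_{k=1}^{d} s_{k,j}^2 \le \lambda_{\max}\Big(\frac{1}{n} Z^T Z\Big), \quad j \in H_l,\ l = 1, \cdots, h. \tag{6.15}$$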



24 DAN SHEN, HAIPENG SHEN AND J. S. MARRON 

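The cross product matrix $Z^T Z$ follows a standard $n$-dimensional Wishart
$W_n(d, I)$ distribution with $d$ degrees of freedom and identity covariance
matrix, see e.g. Muirhead (1982, p82) [20]. Using Lemma 6.4, we have that

$$\lambda_{\max}\Big(\frac{1}{n} Z^T Z\Big) \overset{a.s.}{\sim} \frac{d}{n}. \tag{6.16}$$

Using (6.4), (6.15), and (6.16), we have

$$\sum_{k=m+1}^{d} \hat u_{k,j}^2 = O_{a.s.}\Big(\frac{d}{n\lambda_j}\Big), \quad j \in H_l,\ l = 1, \cdots, h. \tag{6.17}$$

Note that $\lambda_j \ll \frac{d}{n}$ for $j = \sum_{l=1}^{h} q_l + 1, \cdots, m$. This, together with (6.14)
and (6.17), suggests that

$$\sum_{k = \sum_{l=1}^{h} q_l + 1}^{d} \hat u_{k,j}^2 = O_{a.s.}\Big(\frac{d}{n\lambda_j}\Big), \quad j \in H_l,\ l = 1, \cdots, h. \tag{6.18}$$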

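Noting that $\lambda_k = \delta_1$ for $k \in H_1$, and using (6.13), we obtain that for $k \in H_1$,

$$\frac{1}{n}\sum_{i=1}^{n} z_{i,k}^2 = \delta_1^{-1} \sum_{j=1}^{d} \hat\lambda_j \hat u_{k,j}^2 = \delta_1^{-1} \sum_{j \in H_1} \hat\lambda_j \hat u_{k,j}^2 + \delta_1^{-1} \sum_{j \notin H_1} \hat\lambda_j \hat u_{k,j}^2 \le \delta_1^{-1} (\hat\lambda_1 - \hat\lambda_{q_1+1}) \sum_{j \in H_1} \hat u_{k,j}^2 + \delta_1^{-1} \hat\lambda_{q_1+1}. \tag{6.19}$$

The inequality uses $\hat\lambda_j \le \hat\lambda_1$ for $j \in H_1$, $\hat\lambda_j \le \hat\lambda_{q_1+1}$ for $j \notin H_1$, and
$\sum_{j=1}^{d} \hat u_{k,j}^2 = 1$. In addition, from (6.4), we have $\delta_1^{-1}(\hat\lambda_1 - \hat\lambda_{q_1+1}) \xrightarrow{a.s.} 1 - c$
and $\delta_1^{-1} \hat\lambda_{q_1+1} = c + o_{a.s.}(1)$, where $c = \lim_{n \to \infty} \delta_2/\delta_1 < 1$.

Note that $\frac{1}{n}\sum_{i=1}^{n} z_{i,k}^2 = 1 + o_{a.s.}(1)$. Combining the above with (6.19), we
have that

$$1 + o_{a.s.}(1) \le (1 - c) \liminf_{n \to \infty} \sum_{j \in H_1} \hat u_{k,j}^2 + c \le (1 - c) \limsup_{n \to \infty} \sum_{j \in H_1} \hat u_{k,j}^2 + c \le 1,$$

which yields $\sum_{j \in H_1} \hat u_{k,j}^2 = 1 + o_{a.s.}(1)$, $k \in H_1$. The above means that the
sum of squares of the row elements of $\hat U_{1,1}$ converges to 1. Given that the
sample eigenvectors all have norm 1, the sum of squares of the row or the
column elements of $\hat U_{1,1}$ is less than or equal to 1. It then follows that the
sum of squares of the column elements of $\hat U_{1,1}$ converges to 1, which finishes
the first step of the proof.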

For the second step of the proof, we need to establish the convergence
rate $a_1 \vee \frac{d}{n\delta_1}$ of the above sum of squares. Having shown that the sum of
squares of the row elements of $\hat U_{1,1}$ converges to 1, it follows that the sum
of squares of the row elements of $\hat U_{1,2}$ converges to 0. Furthermore, the sum
of the squares of the column elements of $\hat U_{1,2}$ converges to 0, as follows:

$$\sum_{k \in H_1} \hat u_{k,j}^2 = o_{a.s.}(1), \quad j \in H_2. \tag{6.20}$$

WLOG, we assume that $\delta_3/\delta_2 \to 0$. (If the limit is greater than 0, we can
combine the index sets $H_2$ and $H_3$ together and check whether the corresponding
ratio for the next tier converges to 0. If not, we keep combining the index sets
together until the big jump appears.) Given that $\delta_3/\delta_2 \to 0$, (6.14) and (6.18), it follows that

$$\sum_{k \in H_3 \cup \cdots \cup H_{r+1}} \hat u_{k,j}^2 = o_{a.s.}(1), \quad j \in H_2. \tag{6.21}$$

From (6.20) and (6.21), we have that

$$\sum_{k \in H_2} \hat u_{k,j}^2 = 1 + o_{a.s.}(1), \quad j \in H_2,$$

which means that the sum of squares of the column elements of $\hat U_{2,2}$ also
converges to 1. Again, since the sum of squares of the row or column elements
of $\hat U_{2,2}$ is less than or equal to 1, it follows that the sum of squares of the
row elements of $\hat U_{2,2}$ must converge to 1:

$$\sum_{j \in H_2} \hat u_{k,j}^2 = 1 + o_{a.s.}(1), \quad k \in H_2. \tag{6.22}$$

Given that $\hat\lambda_j/\lambda_j \xrightarrow{a.s.} 1$ and $\lambda_j = \delta_2$ for $j \in H_2$, and (6.22), it follows that, for $k \in H_2$,

$$1 + o_{a.s.}(1) = \frac{1}{n}\sum_{i=1}^{n} z_{i,k}^2 = \delta_2^{-1} \sum_{j=1}^{d} \hat\lambda_j \hat u_{k,j}^2 \ge \delta_2^{-1} \sum_{j \in H_1} \hat\lambda_j \hat u_{k,j}^2 + \delta_2^{-1} \sum_{j \in H_2} \hat\lambda_j \hat u_{k,j}^2 = \delta_2^{-1} \sum_{j \in H_1} \hat\lambda_j \hat u_{k,j}^2 + 1 + o_{a.s.}(1),$$

which yields $\delta_2^{-1} \sum_{j \in H_1} \hat\lambda_j \hat u_{k,j}^2 = o_{a.s.}(1)$, $k \in H_2$.
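For $j \in H_1$, we have that $\hat\lambda_j/\lambda_j \xrightarrow{a.s.} 1$ and $\lambda_j = \delta_1$; hence, it follows that

$$\frac{\delta_1}{\delta_2} \sum_{j \in H_1} \hat u_{k,j}^2 = o_{a.s.}(1), \quad k \in H_2,$$

which yields that

$$\sum_{k \in H_2} \hat u_{k,j}^2 = o_{a.s.}\Big(\frac{\delta_2}{\delta_1}\Big), \quad j \in H_1. \tag{6.23}$$

In addition, from (6.14) and (6.18), we have

$$\sum_{k \in H_3 \cup \cdots \cup H_{r+1}} \hat u_{k,j}^2 = o_{a.s.}\Big(\frac{\delta_2}{\delta_1}\Big) \vee O_{a.s.}\Big(\frac{d}{n\delta_1}\Big), \quad j \in H_1. \tag{6.24}$$

From (6.23), (6.24) and the fact that each sample eigenvector has norm 1, and noting that $a_1 = \delta_2/\delta_1$, we have

$$\sum_{k \in H_1} \hat u_{k,j}^2 = 1 + o_{a.s.}\Big(\frac{\delta_2}{\delta_1}\Big) \vee O_{a.s.}\Big(\frac{d}{n\delta_1}\Big) = 1 + o_{a.s.}(a_1) \vee O_{a.s.}\Big(\frac{d}{n\delta_1}\Big), \quad j \in H_1,$$

which suggests that the sum of squares of the column elements of $\hat U_{1,1}$
converges to 1 with the convergence rate $a_1 \vee \frac{d}{n\delta_1}$, as stated in (6.12) for $l = 1$.
The proof of (6.12) is similar for $l = 2, \cdots, h$. Thus, we have shown the
subspace consistency portion of the results in Scenario (b).

Finally, the strong inconsistency in Scenario (b) follows directly from (6.14)
by setting $k = j$:

$$|\langle \hat u_j, u_j \rangle|^2 = \hat u_{j,j}^2 = O_{a.s.}\Big(\frac{n\lambda_j}{d}\Big), \quad j = \sum_{l=1}^{h} q_l + 1, \cdots, [n \wedge d].$$

Hence, we have finished the proof of Scenario (b) in Theorem 4.3.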

6.4.2. Scenario (a) in Theorem 4.3. Now for Scenario (a), where $d/(n\delta_r) \to 0$,
(6.12) in Section 6.4.1 becomes

$$\sum_{k \in H_l} \hat u_{k,j}^2 = 1 + o_{a.s.}(a_l) \vee O_{a.s.}\Big(\frac{d}{n\delta_l}\Big), \quad j \in H_l,\ l = 1, \cdots, r. \tag{6.25}$$

(6.25), together with similar arguments as in proving Scenario (b), leads to

$$\sum_{k \in H_{r+1}} \hat u_{k,j}^2 = 1 + o_{a.s.}(a_r) \vee O_{a.s.}\Big(\frac{d}{n\delta_r}\Big), \quad m + 1 \le j \le [n \wedge d]. \tag{6.26}$$

From (6.10), (6.25) and (6.26), we obtain the subspace consistency in
Scenario (a).

6.4.3. Scenario (c) in Theorem 4.3. Finally, for Scenario (c), where $n\delta_1/d \to 0$,
the strong inconsistency in Theorem 4.3 follows from (6.14) by setting
$k = j$.

SUPPLEMENTARY MATERIAL 

Additional Proofs 

(http:/ /www. unc.edu/ dshen/PCA/PCASupplment.pdf). Detailed proofs are 
provided for Theorems 3.1, 3.2, 3.3, 4.1, 4.2, 4.4, and the necessary lemmas. 